Introduction

We have previously reported that a region on chromosome 19p13 displays extremely
high rates of loss of heterozygosity (LOH) (ref). LOH is a hallmark of the presence of
a tumor suppressor gene. We proposed to identify potential candidate genes through the
use of cDNA microarrays: we would spot cDNAs from genes mapping to the high-LOH
area onto chips and hybridize them with genomic DNA from breast tumors. The
same arrays could be used in a subsequent study to measure RNA expression of genes
from the high-LOH area. In studies performed outside this proposal, we would analyze
potential candidates through functional studies.

Body 

The first step in the proposed research was to generate a comprehensive list of ESTs
mapped to the region of LOH on chromosome 19p13. During the last year, the Human
Genome Project (www.ncbi.nlm.nih.gov) and the Lawrence Livermore Lab made
considerable progress in sequencing chromosome 19.

Using the available sequence information, we created a list of 120 unique cDNAs that
we were interested in spotting onto arrays. In addition, we listed genes known
to be lost or amplified in breast cancer, to serve as controls. The
majority of the clones were purchased from our Microarray Facility here at Baylor
College of Medicine (which had originally purchased them from Research Genetics).
Clones that were not available from our Core Facility were purchased directly from
Research Genetics. The bacterial clones were grown up, and DNA was amplified using
M13 forward and reverse primers. The PCR products were run on 1% agarose gels
(Figure 1).

A number of clones gave more than one PCR product, which we could not eliminate
by optimizing our PCR conditions. Those clones were restreaked, and
analysis of additional colonies revealed that a number of stocks contained more than one
clone. Some clones yielded no PCR product at all. Finally, sequence analysis
showed that yet other clones did not contain the expected
cDNA. In summary, we encountered problems with approximately 30% of the
cDNA clones. These problems cost significant time because of the additional work
needed to obtain all the PCR products required for the array.

Similar problems were recently described in a "News Feature" in Nature (ref). The
article reports that sequence analysis of 1,189 IMAGE clones from Research Genetics
revealed that only 62% of the stocks definitely represented a pure sample of the correct
clone.

Key  research  accomplishments: 

- Generated a comprehensive list of ESTs covering the high-LOH region, and of controls
for LOH and amplification in breast cancer

- PCR-amplified approximately 200 cDNAs for the custom array after overcoming problems
with IMAGE clones
Reportable Outcomes

n/a 

Conclusions 

Troubleshooting revealed that a significant number of IMAGE clones
are problematic. Despite these issues, which evidently affect
the whole scientific community, we now have a set of cDNAs most of which are
correct, i.e. contain the sequence of the gene of interest. We are currently in close
contact with Dr. Lisa White, the new Director of the Baylor College of Medicine Array
Facility, to print the cDNA arrays. We conclude that, due to circumstances beyond our
control, it took us longer than expected to generate the complete set of PCR products
for the chromosome 19p13 cDNA array. However, we have now generated this valuable
tool and will soon be able to screen breast tumors for loss of specific genes. We
requested and were granted a 1-year extension, and are confident that we will have
generated array data by the end of the proposal.
are down


DNA microarrays are
transforming studies
of gene expression. But
some of the biologists
flocking to exploit this
powerful technology
are not aware of its
potential pitfalls.
Jonathan Knight relates
a cautionary tale.

Some  call  them  DNA  chips,  others 
microarrays,  but  whatever  name  you 
prefer,  they  are  one  of  the  hottest  tools 
in  biology.  A  search  of  the  Medline  database 
for papers published in 1999 with ‘microarray’
in their title yields just 27 results. Try the
same  search  for  2000  and  the  number  jumps 
to  97  —  a  crude  measure,  perhaps,  but  it  is  a 
testament to a revolution that is transforming
studies of gene expression. As the
genomics  revolution  begins  to  make  its 
mark,  biologists  are  turning  in  growing 
numbers  to  a  technology  that  lets  them 
analyse  cells  or  tissues  and  determine,  at  a 
stroke,  which  genes  are  active. 

DNA microarrays consist of a library of
genes immobilized in a grid, usually on a
glass slide. Each individual ‘spot’ in the grid
contains DNA from a single gene that will
bind to the messenger RNA (mRNA) produced
by the gene concerned. So by liquidizing
a sample from a given tissue type, tagging
its  mRNAs  with  fluorescent  dyes  and  then 
exposing  the  sample  to  the  slide,  it  is  possible 
to  obtain  an  instant  visual  read-out  revealing 
which  genes  were  active. 

Researchers  who  previously  studied  the 
activity  of  one  gene  at  a  time  can  now  analyse 
the expression of thousands of genes simultaneously.
But as aficionados explore the
technology’s  limits,  they  are  turning  up 
errors  in  DNA  chips  that  could  lead  unwary 
biologists  towards  erroneous  conclusions. 
And  experts  worry  that  too  few  of  the 
researchers rushing to embrace DNA
microarrays are aware of the potential pitfalls.
“It’s going to revolutionize science. But
the technology is in its infancy, so there are
going to be some growing pains,” says Timothy
Zacharewski, a toxicologist at Michigan
State  University  in  East  Lansing,  who  makes 
and  uses  microarrays.  “It’s  amazing  how 
many  people  are  going  forward  without  a  full 
appreciation  of  what  they  are  getting  into.” 

The  enormous  number  of  genes  that  can 
be  studied  at  one  go  is  the  technology’s  curse, 
as  well  as  the  source  of  its  power.  Although 
microarray production is heavily automated,
there are many opportunities for human
error. “For any experiment you can mislabel
a tube and mess yourself up,” says Joseph
DeRisi, a microarray pioneer at the University
of California, San Francisco. “But here, the
potential  for  the  error  to  magnify  itself  is 
much  more  drastic.  Instead  of  one  tube  at  a 
time,  you  are  doing  6,000.” 

Send  in  the  clones 

One  popular  type  of  array  was  devised  by  a 
team led by Patrick Brown at Stanford University
in California [1], and is based on
libraries of gene sequences made using
mRNA. To store and reproduce these
sequences, researchers make ‘complementary’
DNA (cDNA) copies of the RNA messages
and splice them into loops of DNA
called plasmids. The plasmids are then
inserted into bacteria, which grow in culture
and churn out more plasmids from
which the cDNAs can be derived for spotting
onto microarray slides.

Power tools: microarrays (left) show quickly and
easily which genes in a sample are active, but
despite automated steps in their manufacture,
the chips are still open to significant errors.

Errors  creep  in  as  these  bacterial  cultures, 
or  the  cDNA  clones  extracted  from  them,  are 
manipulated.  The  cultures  are  often  stored  in 
small  plastic  plates,  each  typically  containing 
96  wells,  and  they  are  transferred  from  plate 
to  plate  using  pipetting  robots.  But  bacteria 
can easily contaminate other wells, and technicians
can make errors such as loading
plates into the robots the wrong way round
or taking samples from the wrong well for
sequencing. As a result, between 1% and 5%
of the clones in even the best-maintained sets
do  not  contain  the  sequence  that  they  are 
supposed  to. 

Until recently, few researchers were aware
of the extent to which the errors can multiply
as clone sets are copied and transferred from
lab to lab. But last year, after hearing anecdotal
reports of high error rates in a set of
mouse cDNA clones assembled by a group of
labs called the IMAGE (Integrated Molecular
Analysis of Genomes and their Expression)
consortium, Zacharewski decided to
investigate  further. 

The  IMAGE  consortium  has  compiled  a 
variety  of  cDNA  clone  sets,  which  are  now 
produced  by  commercial  suppliers.  Scientists 
wanting to use IMAGE clone sets for microarray
studies can either buy bacterial cultures
or  purified  cDNAs  and  make  up  their  own 
slides,  or  order  pre-manufactured  chips. 

To  check  the  accuracy  of  commercially 
available  IMAGE  mouse  cDNA  clone  sets, 
Zacharewski  and  his  colleagues  purchased  a 
set  from  one  supplier,  Research  Genetics  of 
Huntsville,  Alabama,  and  sequenced  1,189 
cDNAs.  Only  62%  of  the  stocks  definitely 
represented  a  pure  sample  of  the  correct 
clone [2]. Of the remainder, more than half
seemed  to  contain  the  wrong  cDNA,  and  the 
rest  contained  either  a  mix  of  different 
cDNAs  or  did  not  yield  a  readable  sequence. 
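
The proportions quoted above can be turned into approximate clone counts. The following is a minimal sketch, assuming only the two figures given in the article (1,189 clones sequenced, 62% verified); the function name and the rounding are illustrative choices, and the "more than half" statement yields only a lower bound on the number of wrong clones.

```python
# Approximate clone counts implied by the figures quoted in the article.
TOTAL_CLONES = 1189        # cDNAs sequenced (from the article)
VERIFIED_FRACTION = 0.62   # stocks that were a pure sample of the correct clone


def clone_breakdown(total, verified_fraction):
    """Return approximate counts; rounding is an assumption of this sketch."""
    verified = round(total * verified_fraction)
    remainder = total - verified
    # "More than half" of the remainder held the wrong cDNA,
    # so the smallest possible count is just over half.
    min_wrong = remainder // 2 + 1
    return {"verified": verified, "remainder": remainder, "min_wrong": min_wrong}


if __name__ == "__main__":
    print(clone_breakdown(TOTAL_CLONES, VERIFIED_FRACTION))
```

On these assumptions, roughly 450 of the sequenced stocks were not confirmed as the correct clone.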

In  some  cases,  the  apparent  errors  may 
mean  that  the  sequence  for  the  clone 
deposited  in  the  public  databases  is  wrong, 
rather  than  there  being  a  problem  with  the 
clone. But stocks containing more than
one cDNA were probably the result of cross-contamination,
Zacharewski says. Other
problems may reflect handling errors accumulated
as different labs managed and distributed
the stocks over the years.

Before Zacharewski’s study, reagent suppliers
had acknowledged the potential for
errors and started producing cleaned-up,
‘sequence-verified’ cDNA clone sets. But
even these can be problematical. Indeed,
researchers at three major microarray centres
told Nature that they have found disturbingly
high error rates (up to 30%) in
copies of the sequence-verified version of the
Research Genetics mouse cDNA clone set
studied by Zacharewski’s team.

The  centres  involved  —  at  Vanderbilt 
University  in  Nashville,  Yale  University  in 
New  Haven,  Connecticut,  and  Brigham  and 
Women's  Hospital  in  Boston  —  belong  to  a 
biotechnology  consortium  funded  by  the 
National  Institute  of  Diabetes  and  Digestive 
and  Kidney  Diseases  (NIDDK)  in  Bethesda, 
Maryland.  The  source  of  the  errors  has  yet  to 
be  pinpointed,  and  some  may  have  arisen  at 
the centres concerned. Troy Moore of
Research Genetics maintains that the company's
error rate should not exceed 2%, but
adds: “If we identify a problem, we will work
to correct it.”

Shawn  Levy,  who  works  at  the  Vanderbilt 
centre,  does  not  believe  the  problems  he  has 
found  within  the  Research  Genetics  clone  set 
are  the  result  of  local  mishandling,  as  his  team 
has  analysed  other  clone  sets  and  found  error 
rates of less than 5%. But some of the errors
might reflect the fact that cDNA clones are
usually not sequenced in their entirety, so if
the fragments sequenced by the NIDDK consortium
do not overlap with the partial
sequences deposited in public databases, correct
clones may appear to be in error.

While  the  consortium  members  compare 
their  sequencing  data  in  an  effort  to  pin  down 
the  source  of  the  apparent  errors,  the  Yale 
centre has posted a notice on its website
warning users of the potential for problems.
But regardless of the explanation, the lesson is
clear: even when care is taken to remove erroneous
sequences, it cannot be assumed that
microarrays based on cDNA clones are reliable.
“I think errors may be inherent to the
system,” says Steve Gullans, who heads the
centre at the Brigham and Women's Hospital.

Teething troubles: Timothy Zacharewski (above)
was shocked by the high error rates he found in a
set of cDNA clones used to make microarrays.
Joseph DeRisi (right) is worried that researchers
cannot check for mistakes in commercial chips.

As a result, the NIDDK consortium plans
to increase its output of microarrays based
on a rival technology. In these chips, the grid
consists of oligonucleotides, or oligos: short,
single-stranded DNA segments built
to order by chemical synthesis [1]. This construction
process avoids problems with bacterial
contamination, and should mean that
each sequence is what the researcher orders.
On the minus side, oligo-based microarrays
are expensive. And ultimately, they are only
as good as the information used to direct
the oligos’ synthesis, as the DNA chip company
Affymetrix of Santa Clara, California,
recently discovered.

Mistaken  identity 

Affymetrix can pack up to 400,000 different
oligos on a single array, usually representing
around 10,000 genes, with 40 oligos
for each gene. But in February, Affymetrix
announced  that  up  to  a  third  of  the 
sequences  on  one  set  of  mouse  arrays  were 
wrong.  The  company  had  used  sequences 
from  the  public  sequence  databases  that 
were  known  to  be  ambiguous,  and  which 
actually  corresponded  to  the  wrong  strand 
from  the  DNA  double  helix.  As  a  result,  the 
oligos  could  not  detect  their  target  mRNAs. 
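
Why a probe taken from the wrong strand fails can be illustrated with a short sketch. This is a simplified model, not a description of the Affymetrix design process: it assumes the labelled target carries the sense sequence of the mRNA (written here as DNA, with T in place of U), that hybridization requires perfect antiparallel complementarity, and it uses a made-up transcript. All names are hypothetical.

```python
# Simplified model of strand-specific hybridization (illustrative assumptions only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridize(probe, target):
    """True if the probe is perfectly complementary to some stretch of the target."""
    return reverse_complement(probe) in target.upper()


# Hypothetical transcript, written as DNA for simplicity.
transcript = "ATGGCCATTGTAATGGGCCGC"
region = transcript[3:15]

# A correctly designed probe is the reverse complement of the target region.
correct_probe = reverse_complement(region)
# A probe copied from the wrong strand has the same sequence as the target itself.
wrong_strand_probe = region

if __name__ == "__main__":
    print(can_hybridize(correct_probe, transcript))       # complementary: binds
    print(can_hybridize(wrong_strand_probe, transcript))  # same sense: no binding
```

In this model the wrong-strand probe is identical to the target rather than complementary to it, so no duplex can form and the spot stays dark regardless of how strongly the gene is expressed.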

Affymetrix  has  promised  to  replace  the 
arrays. “It’s going to be an inconvenience, at
most,” says Carrolee Barlow of the Salk Institute
for Biological Studies in La Jolla, California,
who is using the chips to investigate the
genetics of brain disorders. But to DeRisi, the
incident points out the risks inherent in
commercial DNA chips. “You are at the
mercy of the company,” he says. “That is a
tough  situation  when  you  are  not  allowed  to 
proofread  what  they  have  done.” 

But  even  perfect  arrays  do  not  guarantee 
good  science.  Microarray  experts  say  that 
some  new  users  seem  to  be  so  mesmerized  by 
the technology’s power that they are forgetting
basic principles of experimental design.
Ash Alizadeh, a graduate student in Brown’s
Stanford lab, says he knows of several
microarray studies lacking the proper controls
and replications needed to ensure that
differences in gene expression really are associated
with the variable under investigation.

Although  such  shortcomings  should  be 
spotted  by  journal  editors  and  reviewers, 
erroneous  results  caused  by  faulty  chips  are 
harder  to  detect  —  and  experts  are  sure  that 
some  have  entered  the  literature.  They  are 
urging  users  not  to  draw  firm  conclusions 
about the activity of individual genes without
checking the sequence of the spot concerned
and verifying the result using alternative
methods of monitoring gene expression.

Within  a  few  months,  predicts  Gullans, 
journal  reviewers  will  routinely  be  asking 
these  questions.  And  then  perhaps  the  focus 
will  be  back  on  the  immense  power  of 
microarrays,  rather  than  their  limitations.  ■ 
Jonathan  Knight  writes  for  Nature  from  San  Francisco. 